A framework to analyze opinion formation models

Comparing model predictions with real data is crucial to improve and validate a model. For opinion formation models, validation based on real data is uncommon and difficult, partly because systematic approaches for a meaningful comparison are lacking. We introduce a framework to assess opinion formation models, which can be used to determine the qualitative outcomes that an opinion formation model can produce, and to compare model predictions with real data. The proposed approach relies on a histogram-based classification algorithm and on transition tables. The algorithm classifies an opinion distribution as perfect consensus, consensus, polarization, clustering, or dissensus; these qualitative categories were identified from World Values Survey data. The transition tables capture the qualitative evolution of the opinion distribution between an initial and a final time. We compute the real transition tables based on World Values Survey data from different years, as well as the predicted transition tables produced by the French-DeGroot, Weighted-Median, Bounded Confidence, and Quantum Game models, and we compare them. Our results provide insight into the evolution of real-life opinions and highlight key directions to improve opinion formation models.


Simulations to produce Transition Tables
The World Values Survey results are assumed to be representative of the corresponding communities: for each country, question, and survey wave, if a different number of people in the same country were asked the same question at the same time, the resulting histogram would be a simple re-scaling of the original one (with the same 'shape'). Hence, not only can we re-scale histograms so as to evolve opinion formation models with more or fewer individuals than the real survey answers, but we can also think of the answers in different survey waves, for the same country, as given by the same people (which is not the case in reality). This allows us to compare predicted and real opinions. The predicted transition table from Wave K to Wave K + 1 is computed following these steps:
1. set the number of individuals for the model simulation, N;
2. take an initial histogram (corresponding to a particular country and question) from Wave K, then scale it so that the total sum of bin counts is equal to N and each bin count is an integer;
3. transform the scaled histogram into a set of N initial opinions: if bin k has n elements, then n individuals are assigned an initial opinion equal to the middle value of bin k;
4. generate the graph G, over which the opinions evolve, as a strongly connected small-world network;
5. assign each initial opinion to a node in the graph;
6. evolve the opinions, with the given initial conditions, over the generated network graph, according to the update rule of the chosen opinion formation model, for T time steps;
7. classify the final opinions;
8. repeat steps 2 to 7 for every question and every country;
9. based on the qualitative classification of all the initial (real) and final (predicted) opinion distributions, construct the predicted transition table.
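Steps 2 and 3 can be illustrated with a minimal Python sketch. The function names, the largest-remainder rounding rule, the example histogram, and the choice of bin midpoints 1 to 10 are all assumptions made for illustration; the original scripts may round or normalise differently.

```python
def scale_histogram(counts, n_agents):
    """Rescale bin counts to integers that sum to n_agents.

    Assumption: largest-remainder rounding is used to keep the 'shape'
    of the histogram while forcing integer bin counts.
    """
    total = sum(counts)
    if total <= 0:
        raise ValueError("histogram must contain at least one answer")
    exact = [c * n_agents / total for c in counts]
    scaled = [int(e) for e in exact]  # floor, since counts are non-negative
    shortfall = n_agents - sum(scaled)
    # Hand the remaining units to the bins with the largest fractional parts.
    order = sorted(range(len(counts)),
                   key=lambda k: exact[k] - scaled[k], reverse=True)
    for k in order[:shortfall]:
        scaled[k] += 1
    return scaled


def histogram_to_opinions(scaled_counts, bin_midpoints):
    """Give the midpoint of bin k to each of the scaled_counts[k] individuals."""
    opinions = []
    for n, mid in zip(scaled_counts, bin_midpoints):
        opinions.extend([mid] * n)
    return opinions


# Hypothetical Likert-10 histogram (answers 1..10); not real survey data.
counts = [120, 80, 60, 40, 150, 90, 70, 110, 60, 220]
midpoints = list(range(1, 11))  # assumption: bin k is centred on answer k
scaled = scale_histogram(counts, 100)
opinions = histogram_to_opinions(scaled, midpoints)
```

The resulting list `opinions` has exactly N entries and can then be shuffled and assigned to the nodes of the graph (step 5).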
The above steps require four seemingly arbitrary choices that could change the outcome of the simulations: (i) the number N of considered individuals, (ii) the graph topology G, (iii) the initial opinion assignment, and (iv) the number T of time steps for the model simulation. The possible changes due to the number of individuals, the small-world network topology, and the initial opinion assignment were evaluated through an extensive simulation campaign. Models with N = 100, N = 500, and N = 1000 individuals were simulated. For each choice of N, 5 different small-world network topologies were generated, and for each network topology 5 different random initial opinion assignments were considered (thus, for each choice of N, 25 systems were evolved). For each network topology and random assignment of initial opinions, the transition table was computed, resulting in a total of 75 different transition tables (for each considered model). These transition tables were then averaged (entry by entry). For each entry, the variability is expressed as the difference between its maximum and minimum value across all 75 tables. The results are presented in Tables 10 and 11. Concerning the number of time steps, the French-DeGroot and Bounded Confidence models were evolved for 50 iterations: since in these models every individual can change opinion at each time step, every individual had the opportunity to change opinion 50 times. In the Weighted-Median model, only one individual changes opinion per iteration; for this model we therefore considered 5000, 25000, and 50000 iterations for the graphs with 100, 500, and 1000 individuals respectively, so that on average each individual could change opinion 50 times and the results were comparable. We allowed 50 opinion changes because of the type of questions, which involved opinions on values, and because of the time interval between real survey waves, 5 years on average: opinions were thus allowed to change at most 10 times per year.
For more 'trivial' questions, or in a quickly changing opinion environment, more iterations could be allowed. It is important to stress that the results may vary depending on the total number of times an individual is allowed to change opinion: even a model known to always asymptotically lead to perfect consensus can produce a different distribution if it evolves for a short time. However, in our simulations we observed that the French-DeGroot model and the Bounded Confidence (r = 0.7) model reach consensus after a few iterations, while the Bounded Confidence (r = 0.1) model leaves the qualitative opinion distribution unchanged even over a very long simulation horizon; hence these models show very little sensitivity to the number of time steps.
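The bookkeeping of the simulation campaign described above can be sketched in Python as follows. The function and constant names are hypothetical, and tables are represented as plain nested lists; this only illustrates the entry-by-entry average, the max-minus-min variability, and the iteration count used for the Weighted-Median model.

```python
SIZES = (100, 500, 1000)   # values of N considered
N_NETWORKS = 5             # small-world topologies per N
N_ASSIGNMENTS = 5          # random initial opinion assignments per topology

# 3 sizes x 5 networks x 5 assignments = 75 transition tables per model
n_tables = len(SIZES) * N_NETWORKS * N_ASSIGNMENTS


def weighted_median_iterations(n_agents, updates_per_agent=50):
    """Iterations needed so that each agent updates 50 times on average
    when only one agent updates per iteration."""
    return n_agents * updates_per_agent


def average_and_range(tables):
    """Entry-wise mean and (max - min) over equally sized tables."""
    rows, cols = len(tables[0]), len(tables[0][0])
    mean = [[sum(t[r][c] for t in tables) / len(tables)
             for c in range(cols)] for r in range(rows)]
    spread = [[max(t[r][c] for t in tables) - min(t[r][c] for t in tables)
               for c in range(cols)] for r in range(rows)]
    return mean, spread
```

For example, `weighted_median_iterations(500)` gives the 25000 iterations quoted for the graphs with 500 individuals.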

French-DeGroot Model
The weights $w_{ij}$ were chosen from a uniform random distribution U(0, 1) and then normalized so that the corresponding adjacency matrix is row stochastic. Regardless of the chosen weights, if the graph is strongly connected, then the opinions (evolving according to the French-DeGroot model) will asymptotically converge to a common value, leading to perfect consensus. However, it is possible to construct ad-hoc strongly connected networks for which the results vary. One could for instance divide the vertices of the network into two groups and decrease the weights of the edges between groups so that 'virtually' two strongly connected components are formed, even though they are technically one single strongly connected component. Under these circumstances, the opinions will first evolve towards polarization and then very slowly converge to perfect consensus. Thus, if one were to stop the simulation before the opinions have converged to consensus, the result would be interpreted as polarization. Hence, the results presented in the paper will not hold for every possible digraph. However, they hold for most digraphs (and they hold almost always for randomly generated digraphs). Furthermore, the analysis and conclusions drawn from the results hold for any possible choice of the weights, namely: (i) there is a strong bias towards consensus that is not present in real life, and (ii) there is no mechanism to go from perfect consensus or consensus to polarization, clustering, or dissensus.
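As an illustration of the weight construction and of one French-DeGroot update, consider the following Python sketch. It assumes the standard averaging rule x(t+1) = W x(t) with a row-stochastic W; the function names and the small example network (a directed ring with self-loops) are hypothetical and are not taken from the original MATLAB scripts.

```python
import random


def random_row_stochastic(adjacency, rng):
    """Draw U(0, 1) weights on existing edges and normalise each row to sum to 1."""
    weights = []
    for row in adjacency:
        raw = [rng.random() if edge else 0.0 for edge in row]
        total = sum(raw)
        if total == 0.0:
            raise ValueError("every agent needs at least one positive weight")
        weights.append([w / total for w in raw])
    return weights


def degroot_step(weights, opinions):
    """One synchronous update: each opinion becomes a weighted average."""
    return [sum(w * x for w, x in zip(row, opinions)) for row in weights]


# Hypothetical example: directed ring of 4 agents with self-loops.
rng = random.Random(0)
adjacency = [[1, 1, 0, 0],
             [0, 1, 1, 0],
             [0, 0, 1, 1],
             [1, 0, 0, 1]]
weights = random_row_stochastic(adjacency, rng)
opinions = [0.1, 0.4, 0.6, 0.9]
for _ in range(50):
    opinions = degroot_step(weights, opinions)
# Each update is a convex combination, so this spread never increases.
spread = max(opinions) - min(opinions)
```

Because every new opinion is a convex combination of current opinions, the range of opinions can only shrink, which is the bias towards consensus discussed above.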

Weighted-Median Model
As before, the weights $w_{ij}$ were chosen from a uniform random distribution U(0, 1) and then normalized so that the corresponding adjacency matrix is row stochastic. The results presented in the paper can change if different weights are chosen: in fact, due to the stochastic nature of the model, the results may vary even with the same initial conditions. Furthermore, in this case there are no closed-form theoretical results that predict the asymptotic value of the opinions. Despite the lack of theoretical results, the foundations on which the model is built suggest that it produces polarization, clustering, or dissensus only with very low probability. The model is based on cognitive dissonance theory (in which agents tend to minimize their disagreement with their neighbours) and, as such, the opinions evolve with a strong bias towards agreement. Therefore, although the numerical results may vary with different digraphs, the conclusions and analysis are the same regardless of the choice of the weights.
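A single asynchronous update of a weighted-median rule can be sketched in Python as below. This is an assumption-laden illustration: it takes the lower weighted median (the smallest opinion whose cumulative weight reaches half of the total), picks the updating agent uniformly at random, and ignores any special tie-breaking rule the original model may use. All names are hypothetical.

```python
import random


def weighted_median(values, weights):
    """Smallest value whose cumulative weight reaches half of the total weight."""
    pairs = sorted((v, w) for v, w in zip(values, weights) if w > 0)
    half = sum(w for _, w in pairs) / 2.0
    cumulative = 0.0
    for value, weight in pairs:
        cumulative += weight
        if cumulative >= half:
            return value
    raise ValueError("weights must contain at least one positive entry")


def weighted_median_step(weights, opinions, rng):
    """Pick one agent at random and move it to the weighted median of its row."""
    i = rng.randrange(len(opinions))
    updated = list(opinions)
    updated[i] = weighted_median(opinions, weights[i])
    return updated
```

Since only one agent moves per call, a graph with N agents needs about 50 N calls for each agent to update 50 times on average, and the updated opinion is always one of the opinions already present in the population.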

Bounded Confidence Model
For this model the choice of weights $w_{ij}$ depends on the digraph. If $N_{\hat{i}}$ is the set of in-neighbours of agent $\hat{i}$, then the weights $w_{\hat{i}\hat{j}}$ are the same for all $\hat{j} \in N_{\hat{i}}$ and equal to $|N_{\hat{i}}|^{-1}$. This is done to guarantee that the corresponding adjacency matrix is row stochastic at every time step. Therefore, the only parameter of the model is the confidence radius. In principle, every agent could have a different confidence radius, and even an "asymmetric" confidence radius depending on whether the opinion of the other agent is larger or smaller. So, the results presented in the paper can change if different weights are chosen, but not significantly. At its core, the Bounded Confidence model can be seen as the evolution of subgroups of agents (the "strongly connected components", depending on the agents' opinions and their confidence radius, whose composition evolves over time depending on how the opinions change), whose opinions evolve simultaneously according to the French-DeGroot model. Therefore, the number of final opinions is the same as the number of final strongly connected components. The smaller the confidence radius, the larger the number of different final opinions. Of course, if the confidence radius is too small, then the opinions remain essentially unchanged, as shown by model BC1.
Hence, unlike the French-DeGroot model, for some choices of the parameters the Bounded Confidence model can produce polarization, clustering, and dissensus; however, these outcomes result from dividing the population into small groups, each converging to a common opinion. It is not an active separation of opinions, but a passive division. Like the French-DeGroot model, this model also lacks mechanisms to produce polarization, clustering, and dissensus starting from consensus and perfect consensus. Furthermore, the absence of consensus for a non-trivial confidence radius evidences the bias towards perfect consensus that is inherited from the dynamics of the French-DeGroot model.
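The passive division into groups can be seen in a minimal Python sketch of one Bounded Confidence update with equal weights $|N_i|^{-1}$. It assumes that an agent trusts itself plus those network neighbours whose opinion lies within the confidence radius, and that opinions are scaled to [0, 1]; the function name and the example values are hypothetical.

```python
def bounded_confidence_step(adjacency, opinions, radius):
    """One synchronous update with equal weights over trusted neighbours."""
    updated = []
    for i, x_i in enumerate(opinions):
        # Agent i always counts itself; another agent j counts only if it is
        # a network neighbour and its opinion is within the confidence radius.
        trusted = [x_j for j, x_j in enumerate(opinions)
                   if j == i or (adjacency[i][j] and abs(x_j - x_i) <= radius)]
        updated.append(sum(trusted) / len(trusted))  # weight 1/|N_i| each
    return updated


# Hypothetical example on a complete graph of 4 agents.
complete = [[1] * 4 for _ in range(4)]
start = [0.0, 0.1, 0.9, 1.0]
two_groups = bounded_confidence_step(complete, start, 0.2)   # groups stay apart
one_group = bounded_confidence_step(complete, start, 1.0)    # everyone averages
frozen = bounded_confidence_step(complete, start, 0.01)      # nobody moves
```

With a moderate radius the two pairs of agents each collapse to their own average, with a large radius all agents average together, and with a very small radius the opinions do not change, mirroring the three regimes described above.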

Result and Analysis Significance
For different choices of the parameters (weights or confidence radius), the results presented in Tables 10 and 11 may be slightly different. Nevertheless, the analysis and conclusions drawn from these tables will be conceptually the same. Although the models could produce polarization, clustering, and dissensus for some particular networks (chosen ad-hoc) and for short enough evolution times, still the models lack the mechanisms that would allow them to produce such outcomes starting from consensus or perfect consensus. Furthermore, their inherent bias towards perfect consensus shapes the dynamic opinion evolution regardless of the chosen parameters.
ii. Download the .R files.
iii. Using R, convert the .R files to .csv.
iv. Read the .csv files in MATLAB and extract the survey answers to the desired questions.
v. Determine all the countries that have answered the desired questions for Waves 5, 6, and 7.
vi. Extract and store the survey answers for each wave, question, and country in cells to be used later.
(b) Create 5 different small-world networks to be the initial digraphs. This is done for networks with 100, 500, and 1000 vertices. The digraphs are strongly connected and are constructed based on the Watts-Strogatz Small-World Graph Model (for more details, see the corresponding subsection).
(c) Using the real answers, generate the scaled initial opinions for 100, 500, and 1000 agents (for more details, see the corresponding subsection).
2. Execute the main code: This simply consists of the execution of the DataProcessing_Wx_My.m files, where x is the wave index (1 or 2) and y is the method index (1 to 5). Each .m file will create a .mat file called Tables_wave_x_method_y.mat (where x and y are as before) which contains the main results.
3. Interpret the results: Execute the ResultsScript.m, which will generate .tex files including the transition tables.

Step by step instructions to replicate the results
Important: The threshold value denoted by $T_2$ in the paper is called alpha in all the MATLAB scripts. Also, due to the random shuffling of initial conditions, the results may be slightly different even when starting with the same initial opinions and initial network.
14. Run the MATLAB script InitialConditionCreation.m. This script will create the initial networks and initial opinions by executing the scripts InitialNetwork.m and InitialOpinions.m respectively.
15. Run the DataProcessing_Wx_My.m scripts for x=1,2 and y=1,2,3,4,5. These scripts evolve each of the initial conditions for Wave 5 (x=1) and Wave 6 (x=2), for each one of the considered opinion formation models (y=1, ..., 5). These scripts are identical, except for y=2, where the total time variable is different: model 2 is the Weighted-Median model, in which only one individual updates its opinion at each time step, so more time steps are needed for each individual to update 50 times on average. The output of each of these scripts is the .mat file Tables_wave_x_method_y.mat, which contains the 75 predicted transition tables for Wave x and method y.

16. Run ResultsScript.m. This script takes the real data from RealDataHist.mat to create the real transition tables and also the predicted data from Tables_wave_x_method_y.mat to compute the average predicted transition tables that appear in the manuscript. Finally, it creates a .tex file that displays the tables.
Note: by executing steps from 1 to 16, the results obtained may differ from the ones presented in the paper. If the networks and initial conditions are different, then there may also be some differences in the results. This does not affect the conclusions drawn from the paper. Moreover, it is possible to replicate the results obtained in the paper if steps 1 to 13 are skipped and instead the initial conditions provided in the datasets 8 to 25 are used. By doing so, the final results will be the same as those reported in the paper, which are also in the datasets 26 to 35.

Data extraction from the World Values Survey results
The first step is to obtain the survey answers from the World Values Survey results. To do this, it was first necessary to find the IDs of all the desired questions for each wave (see Tables 13, 14, and 15). Then the script RealHistAnalysis.m, which is explained in Algorithm 1, is executed.

Algorithm 1 RealHistAnalysis.m script
6: for country b ∈ C do
7:     Select the answers to question a given in country b.
8:     Save the answers in a cell
9: end for
10: end for
11: end for
12: Save all the answers divided by wave, question, and country in the .mat file RealDataHist.mat

Small-World Network Creation
The initial networks were generated by the InitialNetwork.m script, detailed in Algorithm 2. Its only input is the number of vertices N of the network.

Algorithm 2 InitialNetwork.m script
Require: N
1: Determine randomly the number of connections C, rewiring probability coefficient β, and bidirectional probability coefficient γ.
2: Execute the SmallWorldNetwork.m script to create a Small-World directed network.
3: Assign positive weights from a uniform distribution to the directed network.
4: Normalize the weights such that the corresponding adjacency matrix is row stochastic.
The Small-World Network was created based on the Watts-Strogatz model, using Algorithm 3. It takes the parameters N, number of agents; C, number of connections; β, rewiring probability coefficient; and γ, bidirectional probability coefficient.
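Since the details of Algorithm 3 are not reproduced here, the following Python sketch only shows one plausible way to build a strongly connected, directed Watts-Strogatz-style network from the four parameters N, C, β, and γ. The interpretation of C as the number of forward ring neighbours, the rewiring rule, the use of γ to add the reverse edge, and the retry loop are all assumptions, not the behaviour of SmallWorldNetwork.m.

```python
import random


def small_world_digraph(n, c, beta, gamma, rng):
    """Out-neighbour sets of a directed Watts-Strogatz-style graph (a sketch)."""
    out = [set() for _ in range(n)]
    for i in range(n):
        for step in range(1, c + 1):
            j = (i + step) % n                      # ring-lattice target
            if rng.random() < beta:                 # rewire with probability beta
                candidates = [k for k in range(n) if k != i and k not in out[i]]
                if candidates:
                    j = rng.choice(candidates)
            out[i].add(j)
            if rng.random() < gamma:                # reverse edge with probability gamma
                out[j].add(i)
    return out


def is_strongly_connected(out):
    """True if vertex 0 reaches every vertex and every vertex reaches vertex 0."""
    n = len(out)
    reverse = [set() for _ in range(n)]
    for i, targets in enumerate(out):
        for j in targets:
            reverse[j].add(i)

    def reaches_all(adj):
        seen, stack = {0}, [0]
        while stack:
            for j in adj[stack.pop()]:
                if j not in seen:
                    seen.add(j)
                    stack.append(j)
        return len(seen) == n

    return reaches_all(out) and reaches_all(reverse)


def strongly_connected_small_world(n, c, beta, gamma, rng, max_tries=1000):
    """Regenerate the graph until it is strongly connected (assumed strategy)."""
    for _ in range(max_tries):
        graph = small_world_digraph(n, c, beta, gamma, rng)
        if is_strongly_connected(graph):
            return graph
    raise RuntimeError("no strongly connected graph found")
```

A call such as `strongly_connected_small_world(100, 3, 0.1, 0.5, random.Random(0))` would return out-neighbour sets, to which U(0, 1) weights can then be assigned and normalized row by row as in Algorithm 2.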

Initial Opinion Creation
Let X denote a set of K answers to a Likert-10 scale question (i.e. $X \in \{1, 2, \ldots, 10\}^K$). This set is transformed into initial opinions for N agents using Algorithm 4. The set of final opinions is denotedX.

Algorithm 4 InitialOpinions.m script
Require: N, X
1: Compute the type of opinion distribution of the histogram of X. Denote this type by InitialType.
for the initial small-world network $G_j$ with N vertices, where j = 1, 2, 3, 4, 5 do
    Shuffle randomly the initial opinions.
    Assign initial opinion i to agent i in the digraph for i = 1, . . . , N.
6: Compute the final opinion distribution type FinalType.
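Once InitialType and FinalType are available for every country and question, a transition table can be assembled by counting the (initial, final) pairs. The Python sketch below is only an illustration: the function name is hypothetical, and the convention that rows index the initial type, columns index the final type, and entries are raw counts (rather than percentages) is an assumption.

```python
CATEGORIES = ["perfect consensus", "consensus", "polarization",
              "clustering", "dissensus"]


def transition_table(initial_types, final_types):
    """Count transitions; entry [r][c] is the number of cases going from
    category r (initial) to category c (final)."""
    index = {name: k for k, name in enumerate(CATEGORIES)}
    table = [[0] * len(CATEGORIES) for _ in CATEGORIES]
    for initial, final in zip(initial_types, final_types):
        table[index[initial]][index[final]] += 1
    return table
```

For instance, three cases classified initially as consensus, consensus, and polarization, and finally as consensus, perfect consensus, and polarization, contribute one count each to three different entries of the table.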

Datasets
These are all the data and code used to obtain the results in the paper.
• F00007944-WV5_Data_R_v20180912.rds: Raw data for the survey results in Wave 5 obtained from the World Values Survey.
• WV6_Data_R_v20201117.rdata: Raw data for the survey results in Wave 6 obtained from the World Values Survey.
• WVS_Cross-National_Wave_7_csv_v2_0.csv: Raw data for the survey results in Wave 7 obtained from the World Values Survey.
• Wave_5.csv: Wave 5 results used by the script RealHistAnalysis.m to produce the data in RealDataHist.mat.
• Wave_6.csv: Wave 6 results used by the script RealHistAnalysis.m to produce the data in RealDataHist.mat.
• Wave_7.csv: Wave 7 results used by the script RealHistAnalysis.m to produce the data in RealDataHist.mat.
• RealDataHist.mat: Data processed from the World Values Survey, created by the RealHistAnalysis.m script.
• Tables_wave_1_method_3.mat: The 75 predicted transition tables obtained with method 3 (Bounded Confidence with confidence radius 0.1) from wave 5 to wave 6.
• Tables_wave_1_method_4.mat: The 75 predicted transition tables obtained with method 4 (Bounded Confidence with confidence radius 0.3) from wave 5 to wave 6.
• Tables_wave_1_method_5.mat: The 75 predicted transition tables obtained with method 5 (Bounded Confidence with confidence radius 0.7) from wave 5 to wave 6.
• Tables_wave_2_method_3.mat: The 75 predicted transition tables obtained with method 3 (Bounded Confidence with confidence radius 0.1) from wave 6 to wave 7.
• Tables_wave_2_method_4.mat: The 75 predicted transition tables obtained with method 4 (Bounded Confidence with confidence radius 0.3) from wave 6 to wave 7.
• Tables_wave_2_method_5.mat: The 75 predicted transition tables obtained with method 5 (Bounded Confidence with confidence radius 0.7) from wave 6 to wave 7.

No. | Wave 5 ID | Wave 6 ID | Wave 7 ID | Question
 |  |  |  | Is People receive state aid for unemployment. an essential characteristic of democracy? Use this scale where 1 means "not at all an essential characteristic of democracy" and 10 means it definitely is "an essential characteristic of democracy"
26 | 156 | 135 | 245 | Is The army takes over when government is incompetent. an essential characteristic of democracy? Use this scale where 1 means "not at all an essential characteristic of democracy" and 10 means it definitely is "an essential characteristic of democracy"
27 | 157 | 136 | 246 | Is Civil rights protect people's liberty against oppression. an essential characteristic of democracy? Use this scale where 1 means "not at all an essential characteristic of democracy" and 10 means it definitely is "an essential characteristic of democracy"
28 | 161 | 139 | 249 | Is Women have the same rights as men. an essential characteristic of democracy? Use this scale where 1 means "not at all an essential characteristic of democracy" and 10 means it definitely is "an essential characteristic of democracy"
29 | 162 | 140 | 250 | How important is it for you to live in a country that is governed democratically? On this scale where 1 means it is "not at all important" and 10 means "absolutely important" what position would you choose?
30 | 163 | 141 | 251 | And how democratically is this country being governed today? Again using a scale from 1 to 10, where 1 means that it is "not at all democratic" and 10 means that it is "completely democratic," what position would you choose?